The Effects of Pruning Methods on the Predictive Accuracy of Induced Decision Trees

نویسندگان

  • FLORIANA ESPOSITO
  • DONATO MALERBA
  • GIOVANNI SEMERARO
  • VALENTINA TAMMA
چکیده

Several methods have been proposed in the literature for decision tree (post)-pruning. This article presents a unifying framework according to which any pruning method can be de"ned as a four-tuple (Space, Operators, Evaluation function, Search strategy), and the pruning process can be cast as an optimization problem. Six well-known pruning methods are investigated by means of this framework and their common aspects, strengths and weaknesses are described. Furthermore, a new empirical analysis of the e!ect of post-pruning on both the predictive accuracy and the size od induced decision trees is reported. The experimental comparison of the pruning methods involves 14 datasets and is based on the cross-validation procedure. The results con"rm most of the conclusions drawn in a previous comparison based on the holdout procedure. Copyright ( 1999 John Wiley & Sons, Ltd.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data

Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...

متن کامل

Predicting The Type of Malaria Using Classification and Regression Decision Trees

Predicting The Type of Malaria Using Classification and Regression Decision Trees Maryam Ashoori1 *, Fatemeh Hamzavi2 1School of Technical and Engineering, Higher Educational Complex of Saravan, Saravan, Iran 2School of Agriculture, Higher Educational Complex of Saravan, Saravan, Iran Abstract Background: Malaria is an infectious disease infecting 200 - 300 million people annually. Environme...

متن کامل

Study of Various Decision Tree Pruning Methods with their Empirical Comparison in WEKA

Classification is important problem in data mining. Given a data set, classifier generates meaningful description for each class. Decision trees are most effective and widely used classification methods. There are several algorithms for induction of decision trees. These trees are first induced and then prune subtrees with subsequent pruning phase to improve accuracy and prevent overfitting. In...

متن کامل

An Efficient Predictive Model for Probability of Genetic Diseases Transmission Using a Combined Model

In this article, a new combined approach of a decision tree and clustering is presented to predict the transmission of genetic diseases. In this article, the performance of these algorithms is compared for more accurate prediction of disease transmission under the same condition and based on a series of measures like the positive predictive value, negative predictive value, accuracy, sensitivit...

متن کامل

Application of classification trees-J48 to model the presence of roach (Rutilus rutilus) in rivers

In the present study, classification trees (CTs-J48 algorithm) were used to study the occurrence of roach in rivers in Flanders (Belgium). The presence/absence of roach was modelled based on a set of river characteristics. The predictive performance of the CTs models was assessed based on the percentage of Correctly Classified Instances (CCI) and Cohen's kappa statistics. To find the best model...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999